Artificial neural networks have successfully solved a wide variety of problems by training extremely deep networks via backpropagation. Applying backpropagation directly to spiking neural networks involves components that are hard to justify biologically, such as the weight transport problem or separate inference and learning phases. Various methods address the different components individually, but a complete solution remains elusive. Here, we take an alternative approach that avoids backpropagation and its associated issues entirely. Recent work in deep learning proposed independently training each layer of a network via the information bottleneck (IB). Subsequent studies noted that this layer-wise approach circumvents error propagation across layers, leading to a biologically plausible paradigm. Unfortunately, the IB is computed using a batch of samples. Prior work addressed this with weight updates that use only two samples (the current and the previous one). Our work takes a different approach by decomposing the weight update into a local and a global component. The local component is Hebbian and depends only on the current sample. The global component computes a layer-wise modulatory signal that depends on a batch of samples. We show that this modulatory signal can be learned by an auxiliary circuit with reservoir-like working memory (WM). Thus, we can use batch sizes greater than two, and the batch size determines the required capacity of the WM. To the best of our knowledge, our rule is the first biologically plausible mechanism to directly couple synaptic updates to the WM of the task. We evaluate our rule on synthetic datasets and on image classification datasets such as MNIST, and we explore the effect of WM capacity on learning performance. We hope our work is a first step toward understanding the mechanistic role of memory in learning.
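To make the local/global decomposition concrete, here is a minimal NumPy sketch, not the paper's rule: a toy one-layer network takes a Hebbian update from the current sample, scaled by a global signal read out of a fixed-capacity buffer that stands in for the reservoir-like working memory. The `modulation` function, the buffer, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer network: x -> y = tanh(W x).
n_in, n_out = 8, 4
W = rng.normal(scale=0.1, size=(n_out, n_in))
eta = 0.01

# A fixed-capacity buffer stands in for the working memory (WM);
# its capacity plays the role of the batch size.
wm_capacity = 16
wm_buffer = []

def modulation(buffer):
    # Global, layer-wise signal computed from the samples held in WM.
    # Placeholder for the learned IB-based signal: here, just the
    # variance of recent layer activity, clipped for stability.
    acts = np.stack(buffer)
    return float(np.clip(acts.var(), 0.1, 10.0))

for step in range(1000):
    x = rng.normal(size=n_in)      # current sample
    y = np.tanh(W @ x)             # layer activity

    # Local component: purely Hebbian, depends only on the current sample.
    dW_local = np.outer(y, x)

    # Global component: maintain the WM buffer, read out a scalar signal.
    wm_buffer.append(y)
    if len(wm_buffer) > wm_capacity:
        wm_buffer.pop(0)
    m = modulation(wm_buffer)

    # Full update = global modulation times the local Hebbian term,
    # with mild decay to keep the Hebbian dynamics stable.
    W += eta * (m * dW_local - 0.1 * W)
```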
The success of deep learning has been attributed to training large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively expensive, requiring access to powerful computing systems to train state-of-the-art networks. A large body of research has been devoted to addressing the per-iteration cost of training through various model compression techniques such as pruning and quantization. Less effort has been spent targeting the number of iterations. Previous work, such as forgetting scores and GraNd/EL2N scores, addresses this problem by identifying important samples within the full dataset and pruning the remaining samples, thereby reducing the iterations per epoch. Although these methods reduce training time, they use expensive static scoring algorithms prior to training. When accounting for the scoring mechanism, the total run time is often increased. In this work, we address this shortcoming with a dynamic data pruning algorithm. Surprisingly, we find that uniform random dynamic pruning can outperform existing work at aggressive pruning rates. We attribute this to the existence of "sometimes" samples: points that are important to the learned decision boundary only for part of the training time. To better exploit the subtlety of sometimes samples, we propose two algorithms based on reinforcement-learning techniques that dynamically prune samples and achieve higher accuracy than the random dynamic method. We test all our methods against a full-dataset baseline and prior work on CIFAR-10 and CIFAR-100, and we can reduce training time by up to 2x without significant performance loss. Our results suggest that data pruning should be understood as a dynamic process closely tied to a model's training trajectory, rather than a static step based solely on the dataset.
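A hedged sketch of the baseline idea, uniform random dynamic pruning: each epoch trains on a fresh random subset instead of a fixed, statically scored one. `train_on` is a hypothetical callback that runs one epoch of training over the given sample indices.

```python
import numpy as np

def dynamic_random_prune(dataset_size, keep_fraction, num_epochs, train_on):
    # Each epoch draws a fresh random subset of the data, so no sample is
    # permanently discarded the way it is under static scoring.
    rng = np.random.default_rng(0)
    keep = int(keep_fraction * dataset_size)
    for epoch in range(num_epochs):
        subset = rng.choice(dataset_size, size=keep, replace=False)
        train_on(subset)  # one epoch over only these sample indices
```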
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
Language models have become increasingly popular in recent years for tasks like information retrieval. As use cases become oriented toward specific domains, fine-tuning has become the default route to strong performance. To fine-tune these models for specific tasks and datasets, it is necessary to carefully tune the model's hyperparameters and training techniques. In this paper, we present an in-depth analysis of the performance of four transformer-based language models on the task of biomedical information retrieval. The models we consider are DeepMind's RETRO (7B parameters), GPT-J (6B parameters), GPT-3 (175B parameters), and BLOOM (176B parameters). We compare their performance on the basis of relevance, accuracy, and interpretability, using a large corpus of 480,000 research papers on protein structure/function prediction as our dataset. Our findings suggest that smaller models, with <10B parameters and fine-tuned on domain-specific datasets, tend to outperform larger language models on highly specific questions by a significant margin (+50% on average) in terms of accuracy, relevance, and interpretability. However, larger models do provide generally better results on broader prompts.
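For illustration, a hedged fine-tuning sketch using the Hugging Face Trainer API with one of the smaller open models; the hyperparameter values are placeholders rather than the tuned values from the paper, and the training dataset is assumed to be already tokenized from the domain-specific corpus.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

def fine_tune(train_dataset, model_name="EleutherAI/gpt-j-6B"):
    # `train_dataset` is assumed to be a tokenized dataset built from the
    # domain-specific papers; all hyperparameters below are illustrative.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    args = TrainingArguments(
        output_dir="biomed-ir-ft",
        learning_rate=1e-5,              # small LR is typical when fine-tuning
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,  # emulate a larger batch on one GPU
    )
    Trainer(model=model, args=args, train_dataset=train_dataset).train()
    return model, tokenizer
```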
Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has a limited scale and diversity if crowdsourced, and is computationally expensive to extend to new perturbation types if generated using supervised methods. To address this, we introduce a new framework called DISCO for automatically generating high-quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters these generations to distill high-quality counterfactual data. We show that learning with this counterfactual data yields a comparatively small student model that is 6% (absolute) more robust and generalizes 5% better across distributions than baselines on various challenging evaluations. This model is also 15% more sensitive in differentiating original and counterfactual examples, on three evaluation sets written by human workers and via human-AI collaboration.
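A minimal sketch of the generate-then-filter pipeline described above; `generate` and `teacher_score` are hypothetical callables standing in for the prompted general LM and the task-specific teacher model.

```python
def distill_counterfactuals(examples, generate, teacher_score, threshold=0.9):
    # `generate(ex)` yields phrasal perturbations of an example from the
    # prompted general LM; `teacher_score` rates how well a candidate
    # realizes the intended counterfactual. Both are assumed interfaces.
    distilled = []
    for ex in examples:
        for candidate in generate(ex):
            if teacher_score(ex, candidate) >= threshold:
                distilled.append(candidate)  # keep only high-quality data
    return distilled
```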
Multi-document summarization (MDS) has traditionally been studied assuming a set of ground-truth topic-related input documents is provided. In practice, the input document set is unlikely to be available a priori and would need to be retrieved based on an information need, a setting we call open-domain MDS. We experiment with current state-of-the-art retrieval and summarization models on several popular MDS datasets extended to the open-domain setting. We find that existing summarizers suffer large reductions in performance when applied as-is to this more realistic task, though training summarizers with retrieved inputs can reduce their sensitivity to retrieval errors. To further probe these findings, we conduct perturbation experiments on summarizer inputs to study the impact of different types of document retrieval errors. Based on our results, we provide practical guidelines to help facilitate a shift to open-domain MDS. We release our code and experimental results alongside all data or model artifacts created during our investigation.
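The open-domain setting reduces to a retrieve-then-summarize pipeline, sketched below with hypothetical `retriever` and `summarizer` components.

```python
def open_domain_mds(query, retriever, summarizer, k=10):
    # Documents are not given a priori: retrieve them from an index based
    # on the information need, then summarize whatever comes back. The
    # retrieved set may contain errors, which the summarizer must tolerate.
    docs = retriever(query, top_k=k)
    return summarizer(docs)
```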
Language tasks involving character-level manipulations (e.g., spelling correction, many word games) are challenging for models based on subword tokenization. To address this, we adapt the interchange intervention training method of Geiger et al. (2021) to operate on type-level variables over characters. This allows us to encode robust, position-independent character-level information in the internal representations of subword-based models. We additionally introduce a suite of character-level tasks that systematically vary in their dependence on meaning and sequence-level context. While simple character-level tokenization approaches still perform best on purely form-based tasks like string reversal, our method is superior for more complex tasks that blend form, meaning, and context, such as spelling correction in context and word search games. Our approach also leads to subword-based models with human-interpretable internal representations of characters.
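A rough sketch of an interchange intervention on a character-level variable, not Geiger et al.'s implementation: activations assigned to one character slot are copied from a source run into a base run, and the intervened output is trained to match the counterfactual label. `model.encode` and `model.decode` are assumed interfaces exposing and consuming intermediate representations.

```python
def interchange_intervention_loss(model, base, source, char_slot, target, loss_fn):
    # Run the model on both inputs, then splice the activations assigned
    # to one character-level variable from the source run into the base run.
    h_source = model.encode(source)
    h_base = model.encode(base)
    h_base[:, char_slot] = h_source[:, char_slot]  # type-level swap
    pred = model.decode(h_base)
    # Train the intervened prediction to match the counterfactual label
    # dictated by the high-level (causal) model of the task.
    return loss_fn(pred, target)
```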
In data-driven systems, data exploration is imperative for making real-time decisions. However, big data is stored in massive databases that are expensive to query. Approximate Query Processing (AQP) is a technique for providing approximate answers to aggregate queries based on a summary of the data (a synopsis) that closely replicates the behavior of the actual data; it is useful wherever an approximate answer, delivered in a fraction of the real execution time, is acceptable. In this paper, we discuss the use of Generative Adversarial Networks (GANs) for generating tabular data that can be employed in AQP for synopsis construction. We first discuss the challenges associated with constructing synopses in relational databases and then introduce solutions to those challenges. Following that, we organize statistical metrics to evaluate the quality of the generated synopses. We conclude that the complexity of tabular data makes it difficult for algorithms to understand relational database semantics during training, and that improved versions of tabular GANs are capable of constructing synopses that can revolutionize data-driven decision-making systems.
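A minimal sketch of GAN-based synopsis construction for AQP, assuming a hypothetical CTGAN-style `gan` object with `fit`/`sample` methods; aggregate queries then run against the small synopsis instead of the full relation.

```python
import pandas as pd

def build_synopsis(table: pd.DataFrame, gan, synopsis_rows: int) -> pd.DataFrame:
    # Train the tabular GAN on the full relation, then sample a small
    # synthetic table that mimics its joint distribution.
    gan.fit(table)
    return gan.sample(synopsis_rows)

def approximate_avg(synopsis: pd.DataFrame, column: str) -> float:
    # Aggregates run on the synopsis instead of the full table, trading
    # exactness for a fraction of the real execution time.
    return synopsis[column].mean()
```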
Trajectory-User Linking (TUL) is a relatively new mobility classification task in which anonymous trajectories are linked to the users who generated them. With applications ranging from personalized recommendations to criminal activity detection, TUL has received increasing attention over the past five years. While research has focused mainly on learning deep representations that capture the complex spatio-temporal mobility patterns unique to individual users, we demonstrate that visit patterns are highly distinctive among users and thus simple heuristics applied directly to the raw data are sufficient to solve TUL. More specifically, we demonstrate that a single check-in per trajectory is enough to correctly predict the identity of the user up to 85% of the time. Moreover, by using a non-parametric classifier, we scale TUL up to over 100k users, an increase of three orders of magnitude over the state of the art. Extensive empirical analysis on four real-world datasets (Brightkite, Foursquare, Gowalla and Weeplaces) compares our findings to state-of-the-art results and, more importantly, validates our claim that TUL is easier than commonly believed.
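The heuristic is simple enough to state in a few lines. This sketch, which assumes check-ins given as `(user_id, venue_id)` pairs, maps each venue to its most frequent visitor and predicts by majority vote over a trajectory's check-ins; it is an illustration of the idea, not the paper's exact classifier.

```python
from collections import Counter, defaultdict

def fit_visit_heuristic(train_checkins):
    # Visit patterns are highly user-specific, so map each venue to the
    # user who checks in there most often.
    visits = defaultdict(Counter)
    for user, venue in train_checkins:
        visits[venue][user] += 1
    return {venue: counts.most_common(1)[0][0]
            for venue, counts in visits.items()}

def predict_user(venue_to_user, trajectory):
    # A single check-in is often enough; vote over the trajectory's venues.
    votes = Counter(venue_to_user[v] for v in trajectory if v in venue_to_user)
    return votes.most_common(1)[0][0] if votes else None
```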
The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natural language to real-world 3D data. In this paper, we curate a large-scale and complementary dataset that extends both of the aforementioned ones by associating all objects mentioned in a referential sentence with their underlying instances inside a 3D scene. Specifically, our Scan Entities in 3D (ScanEnts3D) dataset provides explicit correspondences between 369k objects across 84k natural referential sentences, covering 705 real-world scenes. Crucially, we show that by incorporating intuitive losses that enable learning from this novel dataset, we can significantly improve the performance of several recently introduced neural listening architectures, including improving the SoTA on both the Nr3D and ScanRefer benchmarks by 4.3% and 5.0%, respectively. Moreover, we experiment with competitive baselines and recent methods for the task of language generation and show that, as with neural listeners, 3D neural speakers can also noticeably benefit from training with ScanEnts3D, including improving the SoTA by 13.2 CIDEr points on the Nr3D benchmark. Overall, our carefully conducted experimental studies strongly support the conclusion that, by learning on ScanEnts3D, commonly used visio-linguistic 3D architectures can become more efficient and interpretable in their generalization without needing these newly collected annotations at test time. The project's webpage is https://scanents3d.github.io/ .